A Hybrid Algorithm for the Equal Districting Problem 


Yunfeng Kong 1[0000-0002-0777-3 116] 


' Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions, Ministry of Educa- 
tion, Henan University, 475000, Kaifeng, China 
yfkong@henu.edu.cn 


Abstract. The equal districting problem (EDP) arises in applications such as political redistricting, police 
patrol area delineation, sales territory design and some service area design. The important criteria for these 
problems are district equality, contiguity and compactness. A mixed integer linear programming (MILP) 
model and a hybrid algorithm are proposed for the EDP. The hybrid algorithm is designed by extending 
iterative local search (ILS) algorithm with three schemes: population-based ILS, variable neighborhood 
descent (VND) local search, and set partitioning. The performance of the algorithm was tested on five 


areas. Experimentation showed that the instances could be solved effectively and efficiently. 
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1 Introduction 


Districting problems have been widely applied in geography, economics, environmental science, politics, 
business, public service and many other areas. The equal districting problem (EDP) arises in applications 
such as political redistricting, police patrol area delineation, sales territory design and some service area 
design. The basic criteria for the problem are balance, compactness and contiguity [1] [2]. 

There are two general approaches to solve the EDP: exact methods and heuristics [2] [3]. It is hard to 
solve the EDP by exact methods due to its computational complexity [4]. Salazaraguilar et al. (2011) 
proposed a bi-objective programming model and solved an instance with 150 units and six territories in 
four hours [5]. Duque et al. (2011) have shown that only very small instances of the p-regions problem 
with 49 units could be solved [6]. A new model proposed by Plane et al. (2019) was used to solve political 
districting instances with 25~36 units in 6~53,632 seconds [7]. In practice, some districting criteria were 
not fully considered in model formulations [3]. 

Early studies had also modelled the problem as a capacitated p-median facility location problem, and 
developed several location-based solution approaches [8][9][10][11]. It is not necessary to determine 
district centers in solving the districting problem; however, districting measures such as contiguity and 
compactness rely on district centers [2]. Kong et al. (2019) proposed a center-based model for solving 
the districting problem, in which the district centers were identified by a weighted K-medoids algorithm 
[12]. 

Three classes of heuristic methods were developed for districting problems: the local-search based 
heuristics such as greedy search, simulated annulling, tabu search, and old bachelor acceptance search 
[1][13][14][15][16][17]; evolutionary methods such as evolutionary algorithm, genetic algorithm, and 
scatter search[18][19][20][21]; and swarm intelligence methods such as artificial bee colony[22]. How- 
ever, the high-performance methods are always time consuming. For example, the PEAR algorithm [23] 


needs 3 hours to solve an instance on a super computer with 131K processors. 


This article aims to propose a high-performance hybrid algorithm for the EDP. The hybrid algorithm 
is designed by extending iterative local search (ILS) algorithm with three schemes: population-based 
ILS, variable neighborhood descent (VND) local search, and set partitioning. The performance of the 
algorithm was tested on five areas. Experimentation showed that the instances could be solved effectively 


and efficiently. 


2 Mathematical formulation 


For a geographic area, let set V={1, 2 ... n} denote n basic areal units. Each unit 7 has an attribute value 
pi. Let cj indicate whether units 7 and j share a border and N; be a set of units that are adjacent to unit i 
(Ni = {j | cy =1}). Let variables dj be the distance between units i and j. The districting problem in this 
article is to partition set V into K eaqual, compact and contiguous districts. The EDP is to delineate a 
geographic area into K equal, compact and contiguous districts. A mixed integer linear programming 
(MILP) model is formulated as follows. 


Min. X iev Xrev PidixXix / Liev Pi (1) 
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The objective function (1) is to minimize the average travel distance from basic units to their 
district centers. Constraints (2) ensure that each unit į is assigned to only one district. Constraints 
(3) confirm that districts are almost equal with a predefined error £, e.g. €=5%. Note that Q = 
Niev p;/K. Constraint (4) requests that the area is divided into K regions. The district contiguity is ensured 
by constraints (5), (6) and (7). It is a variant of the network-flow based model [6] to guarantee the district 
contiguity. Constraints (5) and (6) ensure that a flow may only be passed through neighbor units in 
the same district. If unit i does not serve as the center unit, constraints (7) state that one unit of flow 


must be created from this unit. Constraints (8), (9) and (10) define the decision variables. 


3 Algorithm design 


A hybrid algorithm for the EDP was proposed based on iterative local search (ILS) algorithm. ILS is a 
simple, easy to implement, and quite effective metaheuristic for discrete optimization problems. It starts 
from an initial solution and iteratively improves the solution by local search and perturbation. In this 


article, the standard ILS algorithm was enhanced by three schemes: population-based search, variable 


neighborhood descent (VND) search, and set partitioning. Given parameters such as population size 
(psize), perturbation strength (strength), number of consecutive loops that the best solution is not updated 
(mloops), time limit for set partitioning (¢/imit), the hybrid algorithm is outlined as follows: 

1. P = GenerateInitialSolutions (psize) ; 


pool=null; 


Sbest=Best (P); 


notImpr=0; 


Select a solution s from P randomly; 


2 

3 

4 

5. While notImpr < mloops: 

6 

7 s'=Perturbation(s, strength) ; 
8 


s"=VNDsearch(s’') ; 


9. s*=updateCenetrs (s") 

10. If £(s")<f(Spbest) 1 Sbest=s*,notImpr=0; 
il. else: notImprt+=1; 

12; pool=UpdateAreaPool (pool, s*); 

13. P=UpdatePopulation(P,s"*); 


14. s=SetPartitioning (pool,t); 
15. Output s. 

The hybrid algorithm maintains a number of candidate solutions. In step (1), initial solutions for the 
EDP are constructed by a weighted K-medoids algorithm [12]. In step (6), a solution is randomly selected 
from the population as the incumbent solution. After solution perturbation and local search, in step (13), 
the population is updated according to the solution objectives and the similarities among the solutions. 
The population-based extension of ILS may enhance the diversification of local search. The elite and 
diverse solutions are selected by the algorithm. 

The design of local search operators in optimization algorithm is critical for repeatedly improving the 
incumbent solution. The one-unit shift is a widely used operator that moves a boundary unit to its neigh- 
boring district while maintaining the connectivity of its original district. Butsch et al. introduced three 
operators for districting problems: shift, double shift, and swap [24]. In the proposed algorithm, two local 
search operators are used to improve the solutions. Instead of simple local search in ILS, VND search is 
used in step (8). VND is repeatedly exploring different neighborhood structures by the change of neigh- 
borhoods until the solution cannot be improvement. Using VND search, ILS algorithm can easily find 
solutions in local optimum. 

Perturbation is one of the key components of ILS algorithm. Using local search operators, the solution 
may easily reach to local optima. A ruin and recreate procedure is useful to perturb the EDP solution 
from local optima. There are multiple methods to ruin the solution such as deleting some boundary units, 
deleting all the units in some districts, and deleting some units in a connective region. The ruined solution 
must be repaired to be a feasible solution. 

In step (9), the district centers are updated by selecting the best unit in each district. 

In step (14), the best solution might be improved by a set partitioning procedure. In each ILS loop, all 
the districts in solution s* are recorded in the list pool. Each district record includes the units in the 
district and its objective value. At the end of ILS, thousands or more districts will be recorded in the 
pool. A set partitioning problem (SPP) model could be used to select a better solution from the candidate 


service areas. Let Q be the set of districts identified in step (9), U; and O; be the set of basic units in each 
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district 7 and its objective, respectively. A SPP model is used to select K districts from 2. Let Q; be a 
subset of Q, 0; = {ili E€ 2,7 € U;}, the SPP model is formulated as follows: 


Min. Fico Oi Xi (1 1) 

S.T. Dien; x, =1,VjEV (12) 
dien Xi = K (13) 

x, = {0,1}, vie 0 (14) 


4 Experimentation 


Five geographic areas (Fig. 1) are selected to test the EDP. The main attributes of the areas are summa- 
rized in Table 1. 


fee apes 


te era 
Wee es 


Fig. 1. The study areas 
Table 1. Main attributes of the study areas 


Area HD GY ZY GY2 HN 
Geography Res. community County Urban County Province 
No. of basic units 28 297 324 1,276 2,145 
Sum of attribute 2,420 36,824 3,873 819,812 96,918,036 
value 


The proposed algorithm was implemented by using the Python programming language, and was 
executed in PyPy 7.0, a fast and compliant implementation of the Python language (www.pypy.org). The 
variables pi, cj and dj for each area were prepared in ArcGIS 10 (www.esri.com). All the software ran 
on a desktop computer with Intel Core 17-6700 CPU 3.40-GHz, 8 GB RAM and the Windows 10 oper- 
ating system. 

The problem instances were solved by exact and heuristic methods, respectively. First, each in- 


stance model was solve by CPLEX Optimizer 12.6. A solution with lower bound (LB), upper bound 
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(UB), gap between LB and UB (MIPGap) and computation time (Time) could be obtained from CPLEX 
optimizer. Note that only the instances on small and medium areas HD, GY and ZY can be solved by 
CPLEX. Second, each instance was repeatedly solved by the hybrid algorithm for ten times. The param- 
eters were set as follows: e=5%, psize=10, strength=5%, mloops=100, and tlimit=100 second. The heu- 
ristic solutions are evaluated in terms of the average gap to lower bound (Gap), the standard deviation 
among 10 solutions (Dev) and the average computation time (Time). 

Districting results from areas HD, GY and ZY are summarized in Table 2. The solution results show 
that all the instances can be optimally solved by CPLEX optimizer; and the solutions from the hybrid 
algorithm approximate to the optimal solutions or the lower bounds with small gaps ranging from 0.00% 
to 1.19%. Part of solutions are illustrated in Fig. 2, 3 and 4. 

Table 2. Districting results from the areas HD, GY and ZY 


Area K E CPLEX Hybrid 
LB UB MIPGap Time/s Gap Dev Time/s 

HD 2 5% 0.099 0.099 Opt. 0.99 0.00% 0.00% 0.32 
HD 3 5% 0.084 0.084 Opt. 1.20 0.00% 0.00% 0.44 
HD 4 5% 0.071 0.071 Opt. 1.14 0.00% 0.00% 0.52 
HD 5 5% 0.060 0.060 Opt. 1.31 0.00% 0.00% 0.62 
GY 2 5% 8.149 8.149 Opt. 6,098.57 0.00% 0.00% 6.20 
GY 3 5% 6.085 6.085 Opt. 3,990.47 0.04% 0.06% 6.18 
GY 4 5% 5.135 5.135 Opt. 11,145.66 1.19% 0.03% 7.51 
GY 5 5% 4.616 4.616 Opt. 7,588.53 0.16% 0.07% 9.92 
GY 6 5% 4.185 4.185 Opt. 11,303.26 0.24% 0.37% 6.67 
GY 7 5% 3.751 3.751 Opt. 3,576.36 0.00% 0.00% 6.17 
GY 8 5% 3.529 3.529 Opt. 5,390.92 0.22% 0.09% 6.92 
GY 10 5% 3.120 3.120 Opt. 4,823.14 0.00% 0.01% 7.18 
ZY 2 5% 0.979 0.979 Opt. 3,480.72 0.00% 0.00% 5.24 
ZY 3 5% 0.831 0.831 Opt. 3,796.54 0.00% 0.00% 5.75 
ZY 4 5% 0.717 0.717 Opt. 4,292.85 0.01% 0.04% 5.84 
ZY 5 5% 0.624 0.624 Opt. 4,758.05 0.05% 0.03% 5.80 
ZY 6 5% 0.562 0.562 Opt. 5,811.00 0.08% 0.02% 6.04 
ZY 7 5% 0.513 0.513 Opt. 5,927.70 0.01% 0.01% 6.00 
ZY 8 5% 0.477 0.477 Opt. 4,544.64 0.02% 0.04% 7.32 
ZY 10 5% 0.425 0.425 Opt. 4,973.77 0.02% 0.01% 7.03 
ZY 15 5% 0.351 0.351 - 4,973.77 0.17% 0.10% 9.84 
ZY _ 20 5% 0297 0.297 Opts 539747 029% 019% 15.54 


Fig. 2. Districting maps of the area HD 
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Fig. 3. Districting maps of the area GY 
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Fig. 4. Districting maps of the area ZY 


Districting results from areas GY2 and HN are summarized in Table 3. Due to large numbers of 
basic units in the two areas, the CPLEX failed to solve the instance models with memory overflow errors. 
Among 10 solutions for each instance, the best, average and worst objective are shown in columns Min, 
Avg and Max, respectively. The standard deviation of objectives and the average computing time are 
illustrated in columns Dev and Time, respectively. The results show that the large problems could be 
efficiently solved in 2~9 minutes with very small deviations (<0.18%). Part of districting maps are illus- 
trated in Fig. 5 and 6. 

Table 3. Districting results from areas GY2 and HN 


Area K € Min Avg Max Dev Time/s 
GY2 5 5% 4.439 4.442 4.443 0.03% 115.78 
GY2 10 5% 3.108 3.109 3.110 0.03% 123.96 
GY2 15 5% 2.544 2.548 2.551 0.08% 268.82 
GY2 20 5% 2.174 2.177 2.181 0.11% 165.05 
HN 5 5% 69.762 69.766 69.769 0.00% 302.46 
HN 10 5% 46.548 46.560 46.586 0.02% 330.24 
HN 15 5% 37.562 37.571 37.581 0.02% 546.56 
HN 20 5% 31.781 31.795 31.810 0.03% 404.85 
HN 25 5% 28.291 28.307 28.327 0.04% 539.99 
HN 30 5% 25.854 25.864 25.879 0.03% 340.52 
HN 35 5% 23.764 23.798 23.832 0.09% 323.62 
HN 40 5% 22.244 22.296 22.333 0.13% 498.67 
HN 45 5% 20.760 20.802 20.860 0.18% 388.39 
HN 50 5% 19.541 19.590 19.612 0.12% 410.77 
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Fig. 5. Districting maps of area GY2 


hh 


Fig. 6. Districting maps of area HN 


5 Conclusion 


A mathematical model and a hybrid algorithm were proposed for the EDP in this paper. The MILP model 
is different from existing models. First, it is a center-based model, and thus it is easy to formulate an 
objective function on district compactness. Second, the constraints on district contiguity is simplified by 
the center units of districts. Third, the EDP is defined as an equally capacitated p-median problem with 
constraints of district contiguity. As a result of the center-based formulation, the proposed model could 
optimally solve EDP instances with 324 basic units, while the districting models in [6] and [7] could only 
solve small instances with 16-49 or 25-36 units, respectively. 

The proposed hybrid algorithm is different from existing algorithms in some aspects. The perturbation 
in ILS algorithm plays an important role in escaping from local optima and an appropriate perturbation 
strength could guide the ILS search approaching the global optimum. The algorithm was designed based 
on the standard ILS algorithm, and further enhanced by four schemes: VND local search, population- 
based ILS, center updating, and set partitioning. The experimentation showed that the EDP instances 
could be solved effectively and efficiently. The solutions found by the hybrid algorithm approximate 
optimal solutions or the lower bounds with very small gaps, and are also robust with deviations less than 
0.20%. 
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